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Abstract 

Interaction with tumbling objects will become more common as human activities in space expand. 
Attempting to interact with a large complex object translating and rotating in space, a human operator 
using only his visual and mental capacities may not be able to estimate the object motion, plan actions 
or control those actions. 

We are developing a robot system (RAMBO) equipped with a camera, which, given a sequence of 
simple tasks, can perform these tasks on a tumbling object. RAMBO is given a complete geometric 
model of the object. A low level vision module extracts and groups characteristic features in images of 
the object. The positions of the object are determined in a sequence of images, and a motion estimate 
of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative 
locations nearby the object sufficient for achieving the tasks. 

More specifically, low level vision uses parallel algorithms for image enhancement by symmetric 
nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by “sector 
filtering”. The object pose estimation is a Hough transform method accumulating position hypotheses 
obtained by matching triples of image features (corners) to triples of model features. To maximize 
computing speed, the estimate of the position in space of a triple of features is obtained by decomposing 
its perspective view into a product of rotations and a scaled orthographic projection. This allows us 
to make use of 2D lookup tables at each stage of the decomposition. The position hypotheses for each 
possible match of model feature triples and image feature triples are calculated in parallel. Trajectory 
planning combines heuristic and dynamic programming techniques. Then trajectories are created 
using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on 
a Connection Machine CM-2 with 16K processors. 


1 Introduction 

The problem of robotic visual navigation has received considerable attention in recent years, but research has 
mostly concentrated on operations in static environments [1-11], The area of robotics in the presence of moving 
bodies has seen little activity so far [12-16]. We are developing a control system which should allow a robot 
with a camera to accomplish a sequence of actions on a moving object 

One primary application for this type of research could be the development of an autonomous vehicle able to 
develop strategies for intercepting a moving target on the ground such as another vehicle. But our approach seems 
general enough to be applied to other domains such as robotics in space. For example, a robotic arm building a 
structure in the absence of gravity might require the capability of interacting autonomously with moving objects. 
During a teleoperated assembling process one of the building elements could break loose from the gripper. Then 
the natural motion of this element would be a translating tumbling motion, and there might be very little time 
to react before the structural element is out of reach of the robot and lost in space. A human operator would 
have little chance to estimate the object motion, plan the actions required to bring the robot gripper to the right 
gripping spot and orientation along the moving element, and complete these actions. However, with the proper 
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equipment, the operator could immediately switch to a mode in which the robot is on its own for recovering the 
tumbling element in the short time available. To be able to handle such situations without human intervention 
the robot could be equipped with a video camera, and could have a database describing the geometry of all the 
types of structural elements being assembled, with the various goal points which could be reached for a proper 
gnp. While m teleoperating mode, the robot could keep track of which structural element is being handled so 
that in case of an emergency the robot would already have retrieved all the relevant information from its datable 
Analyzing a sequence of images, the robot could find the trajectory of the element in location and orientation' 
Extrapolating the trajectory to the immediate future, it would plan its own motion to bring its gripper along the 
trajectory of a goal point of the element. Along this goal trajectory the moving element appears fixed with respect 
to the gripper, so that the gripping action can be accomplished as if in a static environment 

We have set up an experimental facility which has the necessary components for testing various vision-based 
control algorithms for intercepting moving objects. These algorithms could be incoiporated into the navigation 
system of a vehicle able to intercept other vehicles, or in a robotic system able to recover objects which are 
tumbling freely in space . This facility is described in the following section. 


2 Experimental set-up 

A large American Cimflex robot arm, RAMBO, is equipped with a CCD camera and a laser pointer (Figure 1 
top left). Images from the camera are digitized and sent to the Connection Machine for processing. A smaller 
robot arm (Mitsubishi RM-501) translates and rotates an object (called the target in this paper) through space. 

everal light-sensitive diodes with focusing optics are mounted on the surface of the object. RAMBO’s goal is to 
hit a sequence of diodes on the moving object with its laser beam for given durations, possibly subject to overall 
time constraints. Electronics inside the object signal success by turning on an indicator light. Simultaneously we 
are developing a fuU computer simulation in which the camera inputs are replaced by synthetic images (Figure 1, 


3 Summary of Operations 

TTte vision-based control loop for RAMBO is shown in Figure 1. We briefly describe the functions of the different 

modules of this system from data collection to robot motion control, and refer to the sections of this paper which 
give more details. F ^ 

1. The digitizer of the video camera mounted on the robot aim can grab video frames when new visual 

information is needed. A database contains a list of positions of feature points on the target, in the local 
coordinate system of the target. 

2. A low-level vision module extracts locations of feature points from the digitized image (Section 4). 

3. An intermediate vision module finds the location/orientation of the target in the camera coordinate system 
(Section 5). 

4. Since the past camera trajectory is known, the position of the camera in the robot base coordinate system 

when the frame was grabbed is known. The location/orientation of the target is transformed to the robot 
base coordinate system (Section 6). 

5. This most recent target pose at a specific time is added to the list of target poses at previous times. In the 
Ihrget Motion Predictor, a target trajectory in location/orientation space is fitted to these past target poses 
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and extrapolated to the future to form a predicted target trajectory. We also obtain the predicted trajectories 
of goal points around the target. A goal point is a location -which is fixed in the frame of reference of the 
target, thus moving in the frame of the robot base— that one of the joints of the robot has to follow for the 
accomplishment of one of the subtasks of the total action (Section 6). 

6. From the predicted goal point trajectories, the Robot Motion Planner calculates the robot motions necessary 
for following the goal points, and the resulting camera trajectories (Sections 7, 8, 9). If the subtasks are 
not ordered, the Motion Planner finds an optimal order (Section 10). The camera trajectories are used for 
transforming subsequent target pose estimates from a camera coordinate system to an absolute coordinate 
system (Section 6). 

4 Low-level Vision 

The model-based pose estimation described in the next section requires that feature points be extracted from each of 
the images of the target. This is the task of the low-level vision module. Feature points in an image could be images 

of small holes in the target structure, comers of letters, vertices, etc 

Conversely, the geometric description of the target should contain the 3D locations of the feature points which 
are easily detected in images. 

In our experiments, the target is a polyhedra, and the feature points that we use are the vertices of the polyhedra, 
so that our image analysis is partly specific to this type of feature detection. Our image processing algorithms 
involve a sequence of basic local operations [17, 18] implemented on the Connection Machine -enhancement 
[19], edge detection, edge thinning and vertex detection- followed by some simple processing to determine which 
pairs of vertices are connected by edges in the image. This last operation proceeds as follows: 

Given k vertices, we use the processing cells in the upper-triangle of a k x k array of cells in the Connection 
Machine, each assigned to a possible edge between vertices. 

1. Enable a grid of k x k cells. Disable cells in the lower triangle of the array. 

2. Copy the addresses of comer points and the incident angles of their edges to the cells in the diagonal of 
the grid 

3. By horizontal grid-scan and vertical grid-scan, spread the incident angles and the addresses of the vertices 
from the diagonal cells along rows and columns to all the cells in the array. 

4. Each non-diagonal active cells (i,j) now has all the information about the i-th and j-th vertices; these cells 
can determine whether they have a pair of collinear incident edges. If they do, then that vertex pair is 
marked as being connected (Note that we could also count the number of edge pixels along the line joining 
the vertices, but this would be much more costly on the Connection Machine and not worthwhile for our 
purposes). 

From this algorithm we obtain a list of all the vertices with for each vertex a sublist of the vertices connected 
to it. We then produce a list of all the image triples consisting of one vertex with two vertices connected to it. 
This list of image triples is input to the intermediate-level vision module described in the next section. The other 
input is a similar list for the triples of world vertices of the target, from the geometric database describing the 
target. 
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5 Intermediate-level Vision: Pose Estimation of the Target 

The pose estimation algorithm combines three ideas: 

1. Pose estimation by matching triples of image features to triples of target features[20]. 

2. Standard camera rotations [21]. 

3. Paraperspective approximation to perspective projection [22, 23], 

This combination allows the extensive use of 2D look-up tables to replace the costly numerical computations used 

by similar previous methods [20, 24-26], The algorithm is implemented on the Connection Machine. Details are 
given in [27], 

The feature points detected in an image are grouped into triples (the image triangles). Each image triangle 
can be described by one of its vertices (the reference vertex), the length of the two adjacent sides, and the angle 
between them (the reference angle). These adjacent sides do not necessarily have to correspond to actual edges 
in the image, and all distinct triples of points could be considered, with each triple of points producing three 
such image triangles. However, if the feature points are vertices, it is useful to only consider image triangles in 
which the adjacent edges of the reference vertex are actual edges, and to only match these image triangles to 

world triangles with similar characteristics. This increases the proportion of good matches over the total number 
of possible matches. 

The main algorithm steps are as follows: 

1. Each image triangle is transformed by a standard rotation. The standard rotation corresponds to the camera 
rotation around the center of projection which brings the reference vertex of the triangle to the image center. 
For each reference vertex in the image, rotation parameters are read in a 2D lookup table. 

2. Image triangles are then rotated in the image plane around the reference vertex (located at the image center) 
to bring one edge into coincidence with the image x-axis. 

3. Once in this position, an image triangle can be described by three parameters only, the reference angle, the 
edge ratio (ratio of the lengths of the two edges adjacent to the reference vertex), and a size factor. 

4. For each image triangle/target triangle pair, a 2D lookup table can be used to determine the orientation of 
the target mangle in space, if we approximate the true perspective with a parapeispective approximaUon 
[22,23], There is one 2D lookup table per target triangle, which gives two possible orientations of this 
target triangle when the reference angle and edge ratio of its image are entered. 

5. Comparing the size of the image triangle to the size of the target triangle of known orientation, we can then 
find the distance of the target triangle from the camera lens center. 

6. The preliminary transformations of the image mangle can be reversed to obtain the actual 3D pose of the 
target triangle, the corresponding 3D position of the target center, and the image of the target center. 

7. The target center projections are clustered to identify the pose of the whole target. 

When RAMBO analyzes its first image, it does not have any « priori knowledge of which feature mangles 
are visible. In this case, the system uses all the possible combinations of target mangles and image mangles. 
However, clustering gives better results if most improper matches are removed, and it is possible to do so after a 
few consistent pose estimates of the target have been obtained. The system can also avoid considering matches 
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for target triangles which are at a nearly grazing angle with the lines of sight, since for these triangles image 
analysis is likely to perform poorly and paraperspective does not approximate true perspective well. 

For a target producing less than 16K image triangle/target triangle combinations, each pose calculation takes 
around one second on a CM-2 with 16K processors but without floating point processors. 

6 Motion Prediction 

The computation of a target position from an image gives the translation vector and rotation matrix of the target 
coordinate system in the camera coordinate system. However, the camera itself is set in motion by the robot 
arm. The trajectory of the camera in an absolute coordinate system is known, and it is straightforward to get the 
position of the camera coordinate system at the time the image was taken and to find the target position at this 
time in an absolute coordinate system. 

From a sequence of target positions, the robot must be able to predict future positions of the target in order to 
construct plans of actions. These target positions are points in six-dimensional space (three translation parameters 
and three rotation angles), each with a time label. We can fit a parametric function of time, such as a polynomial, 
to each of these sequences of coordinates. The target trajectory is then described parametrically by six functions 
of time. Calculating these functions for a future value t of the time parameter will give a predicted target position 
at this future time. 

If RAMBO is used in space and the axes of the frame of reference of the target coincide with the principal 
axes of inertia, then in the absence of external forces the translation of this coordinate system should be uniform, 
as well as the rotations around the three axes. But more complex cases could occur (for example, a structural 
element could be tethered at one end). The data base describing the target could specify what ranges and types of 
motions are possible, and this data could be used to determine the best way to parameterize the target trajectory. 

7 Task and Trajectory Planning 

In order to perform task and trajectory planning, RAMBO currently makes the simplifying assumption that a 
complex goal can be decomposed into a sequence of simple subgoals, and that each subgoal can be performed 
with one joint of the robot in a fixed position with respect to the target. This joint has to “tag along” with the 
target, thus we call this joint the tagging joint of the robot. The fixed position with respect to the target that 
the tagging joint must follow to complete a subgoal will be called a goal point. All goal points required for 
each complex action on a target can be predefined in a data base of actions specific to each target. Each goal 
point is defined by six coordinates, three for the location and three for the orientation of the tagging joint, in the 
coordinate system of the target. 

Once the tagging joint is moving along the target so that it does not move with respect to the target, the more 
distal joints can be used to perform the finer details required by the subgoal. The programming of these distal 
joints will not be considered here, since it is equivalent to programming a robot to perform a task on a fixed 
object. For example a subgoal for a robot arm on a space shuttle might be grabbing a handle on a tumbling 
satellite. A database containing the geometry of the satellite would also specify in what position — fixed in the 
satellite frame of reference — the wrist of the robot arm should be in order for the end effector to grab the handle. 
Here the tagging joint is the wrist, and the goal point is a position above the handle given in the satellite frame of 
reference. Once the wrist is positioned at the goal point — which requires constant motion control of the robot 
arm during the subgoal completion, since the satellite is tumbling — the joints of the end effector require the 
same grabbing motion with respect to the wrist as would be needed if the satellite were not moving. 
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In our experimental setup, we have concentrated on reaching the goal points. Each subgoal consists of 
illuminating a light-sensitive diode mounted on the surface of the taiget for a given duration. The source of light 
is a laser pointer mounted on the tool plate of the robot arm. Each diode is mounted inside a tube at the focal 
point of a lens which closes that tube, so that the laser beam must be roughly aligned with the optical axis of the 
lens to trigger the electronic circuits which control the output of the diodes. Thus a goal point for the laser tool is 

defined by the positions at a short distance from the lens of a diode along the optical axis, and by the orientation 
of this axis. 


8 Bringing a Tagging Joint to a Goal Point of the Target 

Suppose the original trajectory of the tagging joint in location/orienlation space is the vector p 0 (t) (Figure 2) 
The goal trajectory in location/direction space is given by the vector p g (t). At time t 0 we want the tagging 
joint to "launch” from its original trajectory p 0 (t), and to “land” at time t, = t 0 + T, on the goal trajectory 
P g {t). The reaching trajectory p T (t) should be equal to trajectory p 0 (t) at time t 0 and to trajectory p g (t) at time 
t g . The operation will last for the reaching duration T. Furthermore, the first derivatives should also be equal 
at these times, so that the velocities change smoothly when the robot departs from its original trajectory and 
reaches the goal trajectory. Once a launching time t 0 and a reaching duration T are chosen, the end points of 
the reaching trajectory p r {t), as well as the first derivatives of the reaching trajectory at these points are known. 
These boundary conditions are enough to define p r (t) in terms of a parametric cubic spline, a curve in which all 
the coefficients of the six cubic polynomials of time can be ca\c\Aaxa& 

In our experiments we have also explored an alternative method which uses a scalar piecewise quadratic 
interpolation function f T (t) which is 0 at time t 0 , 1 at time t g , with horizontal derivatives at these times, and 
continuous derivatives in the time interval. This function is expressed by 

fr(t) = 2 , 0 < t < T/2 

f T (t) = -2 +1, T/2 <t <T 

and the reaching trajectory is the interpolated trajectory given by 

Pr(t) = f T (t - t 0 )p g (t) + (1 - / r (t - t O ))p o (0 

Note that the predicted motion of the target and the predicted goal point trajectories should be updated every time 
a new target pose is found for the target. After each of these updates, the reaching trajectory of a tagging joint 
should be recomputed based on the new goal trajectory, and the present joint trajectory, which may itself be a 
reaching trajectory started after a previous update. 

9 Optimizing Reaching Trajectories 

With either method of estimating reaching trajectories, one difficult problem is the preliminary choice of T, the 
duration of the reaching trajectory. Duration T should be chosen so that the resulting linear and angular velocities 
and accelerations are within the limits imposed by the robot design. Also, the reaching trajectory should not cross 
an obstacle or the target itself, and should not require the robot to take impossible configurations. Usually, time 
is the rarest commodity, and the shortest time T compatible with the above constraints should be chosen. 

In our present simulations on a serial machine, the duration T is simply calculated as the time which would 
be necessary if the reaching trajectory followed a linear path, at a constant velocity chosen to be a safe fraction of 


256 



the maximum linear velocity of the robot Finally, we check whether this trajectory crosses robot limits or causes 
a collision with the target. If it does, the reaching trajectory is recalculated with a safe intermediary goal instead 
of the final goal point 

A better optimization of the reaching trajectory would require a choice of duration T which would set the 
velocities and accelerations along the reaching trajectory close to the limit capabilities of the robot. This can 
be done by calculating the reaching trajectories for a series of durations T , finding the maximum of the second 
derivatives along the trajectories, and identifying the trajectory with the smallest duration T which does not require 
positions, velocities and accelerations beyond the robot capabilities. On the Connection Machine, this operation 
can be done in only a few steps. We set up a 2D array of processing cells with time as the vertical dimension. 
Every column of the array contains a copy of the predicted goal trajectory, with the first cell containing the 
position of the goal at the present time in location/direction space, the next cell the position at a time increment in 
the future, and so on. Every column also contains a copy of the trajectory of the tagging joint, sampled with the 
same time increments as the goal trajectory. The difference between columns is that they use different durations 
T of the reaching trajectory, increasing from one column to the next 

Each cell computes a point of the reaching trajectory for the time t corresponding to its row and for duration 
T corresponding to its column, and then computes estimates of appropriate derivatives at its reaching trajectory 
point by communicating with its neighbors in the column. The maxima of the derivatives are computed for each 
column. The column that has the smallest duration T and for which the maxima of the positions and derivatives 
do not violate robot limits is the column which contains the desired reaching trajectory. The near-term future 
motion of the robot should be controlled based on this selected trajectory. 


10 Higher Level Planning 

In a complex action we have a set of tasks A, B, C, D, each of which requires a tagging joint to move smoothly 
to a specific goal trajectory. In some actions the order of the tasks is not specified, and we have to choose a 
good order in which to carry out the tasks. “Good” order here means one which minimizes the total time spent 
moving between goal trajectories. Notice that this is not equivalent to a travelling salesman problem, since the 
time required to move from performing, say, task B to task D, depends on when we move from B to D, therefore 
depends on the sequence of tasks which have been performed before B. 

10.1 Greedy Approach 

At any given time, we can compute all the reaching trajectories to all the goal trajectories of the remaining n 
tasks. From among these n reaching trajectories, we can choose the one with the shortest duration, and pursue 
the corresponding task. This could possibly be repeated in real time after each task is accomplished, but does not 
guarantee the best overall sequence of tasks. 

However, given that our model of the anticipated motion of the target is being updated as new information 
arrives, this procedure may make the most sense, in that there may be no point in computing a global optimal 
sequence of tasks based on a model of anticipated target motion which will not remain correct in the future. 


10.2 Exhaustive Search and Dynamic Programming 

Assuming, however, that our model of anticipated motion is accurate enough to allow a meaningful computation 
of an overall optimal sequence of tasks, one (unattractive) possibility is to compute the total time required to 
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complete all the tasks for all possible orderings of the tasks. For n tasks this will involve computing 

n! 

^ (i-1)! 

best reaching trajectories. 

A dynamic programming method has been developed which can precompute the best overall sequence of n 
tasks using computation proportional to 

hi ("-OK*- 1)! 

For four tasks, for example, this approach would require computing 32 reaching trajectories rather than the 64 
required for the exhaustive approach. 


11 Conclusions 

We have described research on robots acting in dynamic environments. We discussed the use of vision to assess 
the motion of objects relevant to the robot’s goals, and the me of particular motion prediction and planning 
techniques to accomplish these goals. We described a set of experiments currently under way involving a robot 
arm equipped with a camera and laser operating on a single moving target object equipped with light sensors. 
Finally, we detailed the parallel implementation of many of the tasks involved in the accomplishment of goals in 
dynamic environments, including parallel image processing, parallel pose estimation, and parallel planning. 
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